Take-home Exercise 3

This take home exercise aims to explain factors affecting resale prices of public housing in Singapore by building hedonic pricing models using appropriate GWR methods.

Genice Goh true
11-05-2021

1.0 Introduction

Housing price is affected by a variety of factors, including global factors such as the general economy of a country or property specific factors such as structural variables related to the property and locational variables related to the neighbourhood.

Hedonic pricing model is used to examine the effect of housing factors as on property price. However, this method fails to take into consideration that spatial autocorrelation and spatial heterogeneity exist in housing transaction data, which could lead to biased results (Anselin 1998). In view of this limitation, Geographical Weighted Regression (GWR) was introduced to calibrate hedonic price models for housing.

As such, we will be building hedonic pricing models to explain factors affecting the resale prices of public housing in Singapore using appropriate GWR methods.

1.1 The Data

The data sets used for this analysis include:

1.2 Installing and Loading Packages

The packages used for this analysis include:

packages = c('olsrr', 'corrplot', 'ggpubr', 'sf', 'spdep', 'GWmodel', 'tmap', 'tidyverse', 'httr', 'units', 'matrixStats')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}

2.0 Geospatial Data

2.1 Importing Geospatial Data

st_read() of sf package is used to import the geospatial data, which is in shapefile format.

mrt_sf <- st_read(dsn="data/geospatial",
               layer="MRTLRTStnPtt")
Reading layer `MRTLRTStnPtt' from data source 
  `C:\Users\user\Desktop\SMU\Y4S1\IS415\geniceee\IS415_blog\_posts\2021-11-05-take-home-exercise-3\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 185 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6138.311 ymin: 27555.06 xmax: 45254.86 ymax: 47854.2
Projected CRS: SVY21
bus_sf <- st_read(dsn="data/geospatial",
               layer="BusStop")
Reading layer `BusStop' from data source 
  `C:\Users\user\Desktop\SMU\Y4S1\IS415\geniceee\IS415_blog\_posts\2021-11-05-take-home-exercise-3\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5156 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 4427.938 ymin: 26482.1 xmax: 48282.5 ymax: 52983.82
Projected CRS: SVY21
mpsz_sf <- st_read(dsn="data/geospatial",
               layer="MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source 
  `C:\Users\user\Desktop\SMU\Y4S1\IS415\geniceee\IS415_blog\_posts\2021-11-05-take-home-exercise-3\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21

From the output message, we can see that:

2.2 Data Preprocessing

Before we visualise the geospatial data, we will need to conduct data preprocessing to ensure that there are no invalid geometries and missing values.

2.2.1 Invalid Geometries

length(which(st_is_valid(mrt_sf) == FALSE))
[1] 0
length(which(st_is_valid(bus_sf) == FALSE))
[1] 0
length(which(st_is_valid(mpsz_sf) == FALSE))
[1] 9

There are no invalid geometries in both mrt_sfand bus_sf data frames while the mpsz_sf data frame contains 9 invalid geometries. We will now proceed to remove the invalid geometries in the mpsz_sf data frame.

mpsz_sf <- st_make_valid(mpsz_sf)

# Check again for invalid geometries
length(which(st_is_valid(mpsz_sf) == FALSE))
[1] 0

2.2.2 Missing Values

mrt_sf[rowSums(is.na(mrt_sf))!=0,]
Simple feature collection with 0 features and 3 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID STN_NAME STN_NO   geometry
<0 rows> (or 0-length row.names)
bus_sf[rowSums(is.na(bus_sf))!=0,]
Simple feature collection with 1 feature and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 22616.75 ymin: 47793.68 xmax: 22616.75 ymax: 47793.68
Projected CRS: SVY21
    BUS_STOP_N BUS_ROOF_N LOC_DESC                  geometry
264      47201        UNK     <NA> POINT (22616.75 47793.68)
mpsz_sf[rowSums(is.na(mpsz_sf))!=0,]
Simple feature collection with 0 features and 15 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
 [1] OBJECTID   SUBZONE_NO SUBZONE_N  SUBZONE_C  CA_IND     PLN_AREA_N
 [7] PLN_AREA_C REGION_N   REGION_C   INC_CRC    FMEL_UPD_D X_ADDR    
[13] Y_ADDR     SHAPE_Leng SHAPE_Area geometry  
<0 rows> (or 0-length row.names)

We can see that there is 1 observation with missing values in the bus_sf data frame. We will remove it because another bus stop of the same BUS_ROOF_N is found after further data exploration.

bus_sf <- bus_sf[!rowSums(is.na(bus_sf))!=0,]

# Check again for missing values
bus_sf[rowSums(is.na(bus_sf))!=0,]
Simple feature collection with 0 features and 3 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] BUS_STOP_N BUS_ROOF_N LOC_DESC   geometry  
<0 rows> (or 0-length row.names)

2.2.3 Duplicated Values

mrt_sf[duplicated(mrt_sf$STN_NAME),]
Simple feature collection with 19 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 22949.03 ymin: 28735.78 xmax: 42362.71 ymax: 46418.56
Projected CRS: SVY21
First 10 features:
    OBJECTID                STN_NAME STN_NO                  geometry
33        33  ANG MO KIO MRT STATION   NS16 POINT (29807.27 39105.77)
97       105  MACPHERSON MRT STATION   DT26  POINT (34235.8 34256.43)
103      111       BUGIS MRT STATION   EW12 POINT (30491.56 31424.35)
114      122        EXPO MRT STATION   DT35 POINT (42362.71 35285.67)
116      124 BUONA VISTA MRT STATION   CC22 POINT (23251.76 32090.77)
121      129  MARINA BAY MRT STATION    CE2 POINT (30423.43 28735.78)
124      132   CHINATOWN MRT STATION   DT19 POINT (29238.97 29686.54)
134      142    TAMPINES MRT STATION   DT32 POINT (40213.03 37463.37)
140      148   SERANGOON MRT STATION   NE12 POINT (32480.07 36869.39)
144      152  PAYA LEBAR MRT STATION    CC9 POINT (34550.58 33300.34)
bus_sf[duplicated(bus_sf$BUS_STOP_N),]
Simple feature collection with 13 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 13488.02 ymin: 29969.22 xmax: 44144.57 ymax: 47806.75
Projected CRS: SVY21
First 10 features:
     BUS_STOP_N BUS_ROOF_N             LOC_DESC
350       58031        UNK      OPP CANBERRA DR
1569      51071        B21 MACRITCHIE RESERVOIR
2208      82221        B01               Blk 3A
2215      97079        B14  OPP ST. JOHN'S CRES
2392      62251        B03         BEF BLK 471B
2462      22501        B02             BLK 662A
2736      16119        NIL        TELETECH PARK
2976      58229       B01A          BLK 358A CP
3442      67421        NIL CHENG LIM STN EXIT B
3521      68091        B08         AFT BAKER ST
                      geometry
350   POINT (27089.69 47570.9)
1569 POINT (28305.37 36036.67)
2208 POINT (35308.74 33335.17)
2215 POINT (44144.57 38980.25)
2392 POINT (35500.36 39943.34)
2462 POINT (13488.02 35537.88)
2736 POINT (22318.89 29969.22)
2976 POINT (26094.86 47806.75)
3442 POINT (34741.77 42004.21)
3521 POINT (32038.84 43298.68)
mpsz_sf[duplicated(mpsz_sf$SUBZONE_C),]
Simple feature collection with 0 features and 15 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
 [1] OBJECTID   SUBZONE_NO SUBZONE_N  SUBZONE_C  CA_IND     PLN_AREA_N
 [7] PLN_AREA_C REGION_N   REGION_C   INC_CRC    FMEL_UPD_D X_ADDR    
[13] Y_ADDR     SHAPE_Leng SHAPE_Area geometry  
<0 rows> (or 0-length row.names)

We can observe that there are 19 and 13 duplicated observations in the mrt_sf and bus_sf data frames respectively, while there are none in the mpsz_sf data frame. We will proceed to remove the duplicated observations.

mrt_sf <- mrt_sf[!duplicated(mrt_sf$STN_NAME),]
bus_sf <- bus_sf[!duplicated(bus_sf$BUS_STOP_N),]

# Check again for duplicates
mrt_sf[duplicated(mrt_sf$STN_NAME),]
Simple feature collection with 0 features and 3 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID STN_NAME STN_NO   geometry
<0 rows> (or 0-length row.names)
bus_sf[duplicated(bus_sf$BUS_STOP_N),]
Simple feature collection with 0 features and 3 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] BUS_STOP_N BUS_ROOF_N LOC_DESC   geometry  
<0 rows> (or 0-length row.names)

2.3 Verify Coordinate Reference System

We will first need to retrieve the coordinate reference system for verification. st_crs() of sf package is used to do this.

st_crs(mrt_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["SVY21[WGS84]",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
st_crs(bus_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
st_crs(mpsz_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["SVY21[WGS84]",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

From the output messages, we can observe that the EPSG code for the data frames is currently 9001. This is wrong because the EPSG code of projection coordinate system SVY21 is supposed to be 3414, instead of 9001.

st_set_crs() of sf package is therefore used to assign the correct EPSG code for the data frames.

mrt_sf <- st_set_crs(mrt_sf, 3414)
bus_sf <- st_set_crs(bus_sf, 3414)
mpsz_sf <- st_set_crs(mpsz_sf, 3414)

We will now use st_crs() of sf package to retrieve the coordinate reference system again.

st_crs(mrt_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(bus_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(mpsz_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

The EPSG code is now correctly assigned for all sf data frames!!

2.4 Visualising Geospatial Data

It is useful to plot a map to visualise the geospatial data, so that we can easily get a preliminary look at the spatial patterns.

tmap_mode('view')

tm_shape(mrt_sf) +
  tm_dots(col="red") +
tm_shape(bus_sf) + 
  tm_dots(col="blue")

If we look closely, we can see that there are 5 bus stops outside of Singapore’s boundary (46211, 46219, 46239, 46609, 47701). As we are able to travel to and fro Johor Bahru with specific buses, there are designated bus stops located at Johor Bahru.

As such, we should remove these bus stops before proceeding with our analysis.

2.5 Further Data Preprocessing

In this section, we will proceed to remove the bus stops identified earlier.

2.5.1 Inspect the specific bus stops

omit_bus <- list(46211, 46219, 46239, 46609, 47701)
bus_sf[bus_sf$BUS_STOP_N %in% omit_bus,]
Simple feature collection with 5 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 17969.14 ymin: 49173.42 xmax: 20723.24 ymax: 52983.82
Projected CRS: SVY21 / Singapore TM
     BUS_STOP_N BUS_ROOF_N            LOC_DESC
765       47701        NIL          JB SENTRAL
1532      46239         NA          LARKIN TER
2257      46609         NA     KOTARAYA II TER
2269      46211        NIL JOHOR BAHRU CHECKPT
4346      46219        NIL JOHOR BAHRU CHECKPT
                      geometry
765  POINT (20370.54 49173.42)
1532 POINT (17969.14 52983.82)
2257  POINT (20025.46 49261.6)
2269 POINT (20623.18 49199.99)
4346 POINT (20723.24 49200.05)

We can observe that the location descriptions of these bus stops indeed indicate that they are situated in Johor Bahru. We will therefore need to remove them.

bus_sf <- bus_sf[!bus_sf$BUS_STOP_N %in% omit_bus,]

# Check again if we have removed successfully
bus_sf[bus_sf$BUS_STOP_N %in% omit_bus,]
Simple feature collection with 0 features and 3 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21 / Singapore TM
[1] BUS_STOP_N BUS_ROOF_N LOC_DESC   geometry  
<0 rows> (or 0-length row.names)

3.0 Aspatial Data

3.1 Obtaining Aspatial Data

We will be making use of the package onemapsgapi to query the required data sets from OneMap API. We will then save these data sets in CSV format.

library(onemapsgapi)

token <- "<insert your token here>"

data <- list("eldercare", "hawkercentre", "relaxsg", "supermarkets", "kindergartens", "childcare")

for (d in data) {
  df <- get_theme(token, d)
  write_csv(df, str_replace("data/aspatial/df.csv", "df", d))
}

3.2 Importing Aspatial Data

read_csv() of readr package is used to import the CSV files. The outputs are tibble data frames.

eldercare = read_csv("data/aspatial/eldercare.csv")
hawker = read_csv("data/aspatial/hawkercentre.csv")
park = read_csv("data/aspatial/relaxsg.csv")
supermarket = read_csv("data/aspatial/supermarkets.csv")
kindergarten = read_csv("data/aspatial/kindergartens.csv")
childcare = read_csv("data/aspatial/childcare.csv")
school = read_csv("data/aspatial/schools.csv")
mall = read_csv("data/aspatial/shopping-mall.csv")
flat_resale = read_csv("data/aspatial/resale-flat-prices.csv")

It is also important to understand the data that we are working with. glimpse() of dplyr package is therefore used to perform exploratory data analysis.

glimpse(eldercare)
glimpse(hawker)
glimpse(park)
glimpse(supermarket)
glimpse(kindergarten)
glimpse(childcare)
glimpse(school)
glimpse(mall)
glimpse(flat_resale)

3.3 Data Preparation

We will proceed to prepare the data such that they can be used later for the preparation of independent variables. The steps taken include:

# remove redundant columns for eldercare dataset
eldercare <- eldercare %>%
  select(c(1, 4:5))

# remove redundant columns for hawker dataset
hawker <- hawker %>%
  select(c(1, 14:15))

# remove redundant columns for park dataset
park <- park %>%
  select(c(1, 8:9))

# remove redundant columns for supermarket dataset
supermarket <- supermarket %>%
  select(c(1, 7:8))

# remove redundant columns for kindergarten dataset
kindergarten <- kindergarten %>%
  select(c(1, 5:6))

# remove redundant columns for childcare dataset
childcare <- childcare %>%
  select(c(1, 5:6))

# filter out primary schools; remove redundant columns 
prischool <- school %>%
  filter(mainlevel_code == "PRIMARY") %>% 
  select(c(1, 3:4, 25,27))

# filter out 4-room flats with transaction period from 01-Jan-19 to 30-Sep-20
flat_resale <- flat_resale %>% 
  filter(flat_type == "4 ROOM") %>%
  filter(month >= "2019-01" & month < "2020-10")

3.4 Data Preprocessing

We will need to conduct data preprocessing to ensure that there are no NA values in all data frames.

3.4.1 NA Values

eldercare[rowSums(is.na(eldercare))!=0,]
hawker[rowSums(is.na(hawker))!=0,]
park[rowSums(is.na(park))!=0,]
supermarket[rowSums(is.na(supermarket))!=0,]
kindergarten[rowSums(is.na(kindergarten))!=0,]
childcare[rowSums(is.na(childcare))!=0,]
prischool[rowSums(is.na(prischool))!=0,]
mall[rowSums(is.na(mall))!=0,]
flat_resale[rowSums(is.na(flat_resale))!=0,]

There are no NA values in all the data frames.

3.5 Geocoding for Aspatial Data

After exploratory data analysis performed earlier, it is found that prischool and flat_resale data frames do not have Lat and Lng columns. These columns are required for the preparation of the independent variables later. We therefore need to use OneMap API to geocode the coordinates columns.

3.5.1 Create Address Column

We first need to create an address column for the flat_resale data frame since the address data is currently split into 2 columns block and street name respectively.

unite() of dpylr package is used to concatenate the 2 columns.

flat_resale <- flat_resale %>%
  unite("address", block:street_name, sep= " ")

3.5.2 Rename ST. to SAINT

We alo observe that some addresses in the the flat_resale data frame start with “ST.”. It is found later that this will pose problems during geocoding. As a result, we will need to rename “ST.” to its full version “SAINT”.

flat_resale$address <- gsub("ST\\.", "SAINT", flat_resale$address)

3.5.3 Geocoding Function

In this function, the input variable address is passed in as a search value in the query to the API. The output is then converted to a data frame, where we only choose to retain the Latitude and Longitude columns. These 2 columns are returned as the output.

geocode <- function(address) {

  url <- "https://developers.onemap.sg/commonapi/search"

  query <- list("searchVal" = address, "returnGeom" = "Y",
                "getAddrDetails" = "N", "pageNum" = "1")

  res <- GET(url, query = query, verbose())

  output <- content(res) %>%
    as.data.frame %>%
    select(results.LATITUDE, results.LONGITUDE)

  return(output)
}

3.5.4 Geocoding for Primary Schools

We will now loop through each row of the prischool data frame and implement the geocode function. The output is saved as 2 new columns Lat and Lng.

prischool$Lat <- 0
prischool$Lng <- 0

for (i in 1:nrow(prischool)) {
  output <- geocode(prischool$postal_code[i])
  
  prischool$Lat[i] <- output$results.LATITUDE
  prischool$Lng[i] <- output$results.LONGITUDE
}

3.5.5 Geocoding for Resale Flat Prices

We will now loop through each row of the flat_resale data frame and implement the geocode function. The output is saved as 2 new columns Lat and Lng.

flat_resale$Lat <- 0
flat_resale$Lng <- 0

for (i in 1:nrow(flat_resale)) {
  output <- geocode(flat_resale$address[i])
  
  flat_resale$Lat[i] <- output$results.LATITUDE
  flat_resale$Lng[i] <- output$results.LONGITUDE
}

3.6 Structural Factor Preparation

3.6.1 Floor Level

We need to conduct dummy coding on the storey_range variable for us to use it in model later. We will first look at the individual storey-range values for us to get a rough idea on the number of columns that will be produced.

unique(flat_resale$storey_range)

We observed that there are 17 storey range categories.

To conduct dummy coding, we will be using pivot_wider() of dplyr package to create duplicate variables representing every store-range. If the obeservation belongs to the storey-range, the value will be 1. The value will be 0 otherwise.

flat_resale <- flat_resale %>%
  pivot_wider(names_from = storey_range, values_from = storey_range, 
              values_fn = list(storey_range = ~1), values_fill = 0) 

3.6.2 Remaining Lease

We need to convert the remaining_lease column from string to numeric format to use in the model later. We will first split the string within the column and take the value(s) of the year and month. We will then calculate the remaining lease in years and replace the original value.

str_list <- str_split(flat_resale$remaining_lease, " ")

for (i in 1:length(str_list)) {
  if (length(unlist(str_list[i])) > 2) {
      year <- as.numeric(unlist(str_list[i])[1])
      month <- as.numeric(unlist(str_list[i])[3])
      flat_resale$remaining_lease[i] <- year + round(month/12, 2)
  }
  else {
    year <- as.numeric(unlist(str_list[i])[1])
    flat_resale$remaining_lease[i] <- year
  }
}

3.7 Locational Variable Preparation

3.7.1 Good Pri Sch Variable Preparation

We need to filter out good primary schools and save it as variable goodprischool for it to be used in the model. Good primary schools are defined to be Special Assistance Programme (SAP) primary schools or primary schools with the Gifted Education Programme (GEP).

gdprischool <- prischool %>%
  filter(sap_ind == "Yes" | gifted_ind == "Yes")

3.7.2 CBD Variable Preparation

The Central Business District (CBD) is in Downtown Core, located in the southwest part of Singapore. We will therefore take the coordinates of Downtown Core to be the coordinates of the CBD and convert it into a sf data frame.

lat <- 1.287953
long <- 103.851784

cbd_sf <- data.frame(lat, long) %>%
  st_as_sf(coords = c("long", "lat"), crs=4326) %>%
  st_transform(crs=3414)

3.7.3 Convert Datasets to sf

We need to convert the data sets into sf data frames for us to calculate the proximity distance matrices.

flat_resale_sf <- st_as_sf(flat_resale, 
                         coords = c("Lng", "Lat"), crs=4326) %>%
  st_transform(crs = 3414)

eldercare_sf <- st_as_sf(eldercare, 
                         coords = c("Lng", "Lat"), crs=4326) %>%
  st_transform(crs = 3414)

hawker_sf <- st_as_sf(hawker, 
                      coords = c("Lng", "Lat"), crs=4326) %>%
  st_transform(crs = 3414)

park_sf <- st_as_sf(park, 
                    coords = c("Lng", "Lat"), crs=4326) %>%
  st_transform(crs = 3414)

gdprischool_sf <- st_as_sf(gdprischool, 
                           coords = c("Lng", "Lat"), crs=4326) %>%
  st_transform(crs = 3414)

mall_sf <- st_as_sf(mall, 
                    coords = c("longitude", "latitude"), crs=4326) %>%
  st_transform(crs = 3414)

supermarket_sf <- st_as_sf(supermarket, 
                           coords = c("Lng", "Lat"), crs=4326) %>%
  st_transform(crs = 3414)

kindergarten_sf <- st_as_sf(kindergarten, 
                            coords = c("Lng", "Lat"), crs=4326) %>%
  st_transform(crs = 3414)

childcare_sf <- st_as_sf(childcare, 
                         coords = c("Lng", "Lat"), crs=4326) %>%
  st_transform(crs = 3414)

prischool_sf <- st_as_sf(prischool, 
                         coords = c("Lng", "Lat"), crs=4326) %>%
  st_transform(crs = 3414)

3.7.4 Proximity Distance Calculation

This function computes the distance to the nearest facility. st_distance() of sf package is used to compute the distance between all flats and the respective facilities. rowMins() of matrixStats is then used to take the shortest distance within each row in the output matrix. The values will be appended to the data frame as a new column.

proximity <- function(df1, df2, varname) {
  
  matrix <- st_distance(df1, df2) %>%
    drop_units()
  
  df1[,varname] <- rowMins(matrix)
  
  return(df1)
}

We will now implement the proximity function to the required variables.

flat_resale_sf <- proximity(flat_resale_sf, cbd_sf, "PROX_CBD") %>%
  proximity(., eldercare_sf, "PROX_EC") %>%
  proximity(., hawker_sf, "PROX_HA") %>%
  proximity(., mrt_sf, "PROX_MRT") %>%
  proximity(., park_sf, "PROX_PARK") %>%
  proximity(., gdprischool_sf, "PROX_GDPRI") %>%
  proximity(., mall_sf, "PROX_MALL") %>%
  proximity(., supermarket_sf, "PROX_SUP")

3.7.5 Facility Count within Radius Calculation

This function computes the facility count within a radius. Similarly, st_distance() of sf package is used to compute the distance between all flats and the respective facilities. rowSums() of dplyr is then used to count the observations with distances below the defined radius. The values will be appended to the data frame as a new column.

num_radius <- function(df1, df2, varname, radius) {
  
  matrix <- st_distance(df1, df2) %>%
    drop_units() %>%
    as.data.frame()
  
  df1[,varname] <- rowSums(matrix <= radius)
  
  return(df1)
}

We will now implement the num_radius function to the required variables.

flat_resale_sf <- num_radius(flat_resale_sf, kindergarten_sf, 
                             "NUM_KIN", 350) %>%
  num_radius(., childcare_sf, "NUM_CC", 350) %>%
  num_radius(., bus_sf, "NUM_STOP", 350) %>%
  num_radius(., prischool_sf, "NUM_PRI", 1000)

3.8 Saving the Dataset

Before saving the dataset, we will remove the redundant columns, rename the columns to shorter forms and relocate the price columns to the front of the data frame.

flat_resale_sf <- flat_resale_sf %>%
  select(5, 8, 9, 10:39) %>%
  rename("AREA_SQM" = "floor_area_sqm", "LEASE_YRS" = "remaining_lease", 
         "PRICE"= "resale_price") %>%
  relocate(`PRICE`)

We can now save the final flat resale price data set as a SHP file using st_write of sf package.

st_write(flat_resale_sf, "data/geospatial/resale-flat-prices-final.shp")

4.0 EDA

4.1 Import Full Dataset

st_read() of sf package is used to import the final data set, which is in shapefile format.

flat_resale_sf <- st_read(dsn="data/geospatial",
                          layer="resale-flat-prices-final")
Reading layer `resale-flat-prices-final' from data source 
  `C:\Users\user\Desktop\SMU\Y4S1\IS415\geniceee\IS415_blog\_posts\2021-11-05-take-home-exercise-3\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 15853 features and 32 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 11597.31 ymin: 28217.39 xmax: 42623.63 ymax: 48741.06
Projected CRS: SVY21 / Singapore TM

4.2 EDA using statistical graphs

4.2.1 Dependent Variable

We can plot the distribution of PRICE using a histogram as shown below.

ggplot(data=flat_resale_sf, aes(x=`PRICE`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

The figure above reveals a slightly right skewed distribution. This means that more HDB flats were transacted at relative lower prices.

Statistically, the skewed distribution can be normalised using log transformation. mutate() of dplyr package is used to derive a new variable called LOG_PRICE using a log transformation on the variable PRICE.

flat_resale_sf <- flat_resale_sf %>%
  mutate(`LOG_PRICE` = log(PRICE)) %>%
  relocate(`LOG_PRICE`, .after = `PRICE`)

We can now plot the distribution of LOG_PRICE.

ggplot(data=flat_resale_sf, aes(x=`LOG_PRICE`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

4.2.2 Independent Variables

It is found that the LEASE_YRS column is still in string format. We will convert it to numeric format for us to conduct EDA and to input into the model later.

flat_resale_sf$LEASE_YRS <- as.numeric(flat_resale_sf$LEASE_YRS)

We will now draw multiple histograms to show multiple distributions of the independent variables. This is done using ggarrange() of ggpubr package.

AREA_SQM <- ggplot(data=flat_resale_sf, aes(x= `AREA_SQM`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

LEASE_YRS <- ggplot(data=flat_resale_sf, aes(x= `LEASE_YRS`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_CBD <- ggplot(data=flat_resale_sf, aes(x= `PROX_CBD`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_EC <- ggplot(data=flat_resale_sf, aes(x= `PROX_EC`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_HAWKER <- ggplot(data=flat_resale_sf, aes(x= `PROX_HA`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_MRT <- ggplot(data=flat_resale_sf, aes(x= `PROX_MRT`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_PARK <- ggplot(data=flat_resale_sf, aes(x= `PROX_PARK`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_GDPRI <- ggplot(data=flat_resale_sf, aes(x= `PROX_GDPRI`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_MALL <- ggplot(data=flat_resale_sf, aes(x= `PROX_MALL`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

PROX_SUP <- ggplot(data=flat_resale_sf, aes(x= `PROX_SUP`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

NUM_KIN <- ggplot(data=flat_resale_sf, aes(x= `NUM_KIN`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

NUM_CC <- ggplot(data=flat_resale_sf, aes(x= `NUM_CC`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

NUM_STOP <- ggplot(data=flat_resale_sf, aes(x= `NUM_STOP`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

NUM_PRI <- ggplot(data=flat_resale_sf, aes(x= `NUM_PRI`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

ggarrange(AREA_SQM, LEASE_YRS, PROX_CBD, PROX_EC, PROX_HAWKER, PROX_MRT, PROX_PARK, PROX_GDPRI, PROX_MALL, PROX_SUP, NUM_KIN, NUM_CC, NUM_STOP, NUM_PRI, ncol = 3, nrow = 5)

We can observe that the distribution of the majority of the independent variables are right skewed, while variables such as LEASE_YRS and PROX_CBD are slightly left skewed.

4.3 EDA using statistical point map

We want to reveal the geospatial distribution flat resale prices (i.e. dependent variable) in Singapore using the tmap package.

tmap_mode("view")

tm_shape(mpsz_sf)+
  tm_polygons() +
tm_shape(flat_resale_sf) +  
  tm_dots(col = "PRICE",
          alpha = 0.6,
          style="quantile") +
  tm_view(set.zoom.limits = c(11,14))

From the map, we can observe that flats with higher resale prices are more concentrated in the central area of Singapore.

5.0 Hedonic Price Modelling

We will be building the hedonic pricing model using both multiple linear regression method and appropriate GWR methods. The independent variables we are looking at include:

5.1 Multiple Linear Regression Method

5.1.1 Visualising the relationships of the independent variables

Before building a multiple regression model, it is important to ensure that the independent variables used are not highly correlated to each other, which will cause the quality of the model to be compromised.

Correlation matrix is commonly used to visualise the relationships between the independent variables. corrplot package will be used to plot a scatterplot matrix of the relationship between the independent variables.

flat_resale <- flat_resale_sf %>%
  st_set_geometry(NULL)
corrplot(cor(flat_resale[, 3:33]), diag = FALSE, order = "AOE",
         tl.pos = "td", tl.cex = 0.5, method = "number", type = "upper")

From the scatterplot matrix, it is clear that PROX_GDPRI is highly correlated to PROX_CBD. In view of this, it is wiser to only include either one of them in the subsequent model building. As a result, PROX_CBD is excluded in the subsequent model building.

5.1.2 Building the Hedonic Pricing Model

lm() is used to calibrate the multiple linear regression model.

flat_mlr <- lm(formula = PRICE ~ AREA_SQM + LEASE_YRS + X01.TO.03 + X04.TO.06 + X07.TO.09 + X10.TO.12 + X13.TO.15 + X16.TO.18 + X19.TO.21 + X22.TO.24 + X25.TO.27 + X28.TO.30 + X31.TO.33 + X34.TO.36 + X37.TO.39 + X40.TO.42 + X43.TO.45 + X46.TO.48 + X49.TO.51 + PROX_EC + PROX_HA + PROX_MRT + PROX_GDPRI + PROX_MALL  + PROX_SUP + NUM_KIN + NUM_CC + NUM_STOP + NUM_PRI, data=flat_resale_sf)

summary(flat_mlr)

Call:
lm(formula = PRICE ~ AREA_SQM + LEASE_YRS + X01.TO.03 + X04.TO.06 + 
    X07.TO.09 + X10.TO.12 + X13.TO.15 + X16.TO.18 + X19.TO.21 + 
    X22.TO.24 + X25.TO.27 + X28.TO.30 + X31.TO.33 + X34.TO.36 + 
    X37.TO.39 + X40.TO.42 + X43.TO.45 + X46.TO.48 + X49.TO.51 + 
    PROX_EC + PROX_HA + PROX_MRT + PROX_GDPRI + PROX_MALL + PROX_SUP + 
    NUM_KIN + NUM_CC + NUM_STOP + NUM_PRI, data = flat_resale_sf)

Residuals:
    Min      1Q  Median      3Q     Max 
-258828  -42665   -8125   35489  590059 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.041e+05  1.630e+04  18.652  < 2e-16 ***
AREA_SQM     1.867e+03  8.707e+01  21.443  < 2e-16 ***
LEASE_YRS    3.370e+03  5.329e+01  63.252  < 2e-16 ***
X01.TO.03   -8.243e+04  1.280e+04  -6.439 1.24e-10 ***
X04.TO.06   -6.182e+04  1.277e+04  -4.843 1.29e-06 ***
X07.TO.09   -4.653e+04  1.275e+04  -3.651 0.000262 ***
X10.TO.12   -3.759e+04  1.274e+04  -2.949 0.003190 ** 
X13.TO.15   -3.288e+04  1.277e+04  -2.575 0.010021 *  
X16.TO.18   -1.780e+04  1.292e+04  -1.378 0.168321    
X19.TO.21    1.227e+04  1.338e+04   0.917 0.359269    
X22.TO.24    8.800e+03  1.352e+04   0.651 0.515212    
X25.TO.27    7.018e+04  1.431e+04   4.905 9.46e-07 ***
X28.TO.30    1.634e+05  1.477e+04  11.062  < 2e-16 ***
X31.TO.33    1.656e+05  1.775e+04   9.331  < 2e-16 ***
X34.TO.36    1.690e+05  1.858e+04   9.093  < 2e-16 ***
X37.TO.39    2.475e+05  1.828e+04  13.541  < 2e-16 ***
X40.TO.42    2.874e+05  2.028e+04  14.168  < 2e-16 ***
X43.TO.45    3.800e+05  4.446e+04   8.546  < 2e-16 ***
X46.TO.48    4.281e+05  4.447e+04   9.626  < 2e-16 ***
X49.TO.51    4.938e+05  5.370e+04   9.196  < 2e-16 ***
PROX_EC     -2.908e+01  9.360e-01 -31.067  < 2e-16 ***
PROX_HA     -4.390e+01  1.203e+00 -36.484  < 2e-16 ***
PROX_MRT    -5.085e+01  1.649e+00 -30.844  < 2e-16 ***
PROX_GDPRI  -1.192e+01  2.221e-01 -53.682  < 2e-16 ***
PROX_MALL   -3.838e+01  1.796e+00 -21.362  < 2e-16 ***
PROX_SUP    -4.347e+01  3.992e+00 -10.889  < 2e-16 ***
NUM_KIN      8.923e+03  6.153e+02  14.502  < 2e-16 ***
NUM_CC      -5.431e+03  3.406e+02 -15.944  < 2e-16 ***
NUM_STOP    -1.937e+03  2.144e+02  -9.032  < 2e-16 ***
NUM_PRI     -1.970e+04  4.545e+02 -43.347  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 73720 on 15823 degrees of freedom
Multiple R-squared:  0.6245,    Adjusted R-squared:  0.6238 
F-statistic: 907.6 on 29 and 15823 DF,  p-value: < 2.2e-16

We can observe that not all the independent variables are statistically significant. We will therefore revise the model by removing those variables which are not statistically significant.

flat_mlr1 <- lm(formula = PRICE ~ AREA_SQM + LEASE_YRS + X01.TO.03 + X04.TO.06 + X07.TO.09 + X10.TO.12 + X13.TO.15 + X25.TO.27 + X28.TO.30 + X31.TO.33 + X34.TO.36 + X37.TO.39 + X40.TO.42 + X43.TO.45 + X46.TO.48 + X49.TO.51 + PROX_EC + PROX_HA + PROX_MRT + PROX_GDPRI + PROX_MALL  + PROX_SUP + NUM_KIN + NUM_CC + NUM_STOP + NUM_PRI, data=flat_resale_sf)

ols_regress(flat_mlr1)
                            Model Summary                              
----------------------------------------------------------------------
R                       0.790       RMSE                    73819.218 
R-Squared               0.623       Coef. Var                  17.026 
Adj. R-Squared          0.623       MSE                5449276926.815 
Pred R-Squared          0.622       MAE                     54122.493 
----------------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 

                                     ANOVA                                       
--------------------------------------------------------------------------------
                    Sum of                                                      
                   Squares           DF       Mean Square       F          Sig. 
--------------------------------------------------------------------------------
Regression    1.427711e+14           26      5.491195e+12    1007.692    0.0000 
Residual      8.624026e+13        15826    5449276926.815                       
Total         2.290113e+14        15852                                         
--------------------------------------------------------------------------------

                                         Parameter Estimates                                          
-----------------------------------------------------------------------------------------------------
      model          Beta    Std. Error    Std. Beta       t        Sig          lower         upper 
-----------------------------------------------------------------------------------------------------
(Intercept)    298078.468     10367.288                  28.752    0.000    277757.402    318399.534 
   AREA_SQM      1873.646        87.177        0.111     21.492    0.000      1702.769      2044.523 
  LEASE_YRS      3374.871        53.307        0.360     63.311    0.000      3270.384      3479.358 
  X01.TO.03    -75555.522      2581.020       -0.239    -29.274    0.000    -80614.616    -70496.428 
  X04.TO.06    -54950.973      2472.752       -0.193    -22.223    0.000    -59797.848    -50104.099 
  X07.TO.09    -39704.471      2489.436       -0.134    -15.949    0.000    -44584.049    -34824.892 
  X10.TO.12    -30753.479      2519.726       -0.099    -12.205    0.000    -35692.430    -25814.529 
  X13.TO.15    -26081.397      2773.183       -0.065     -9.405    0.000    -31517.153    -20645.642 
  X25.TO.27     76572.366      6975.810        0.056     10.977    0.000     62898.983     90245.748 
  X28.TO.30    169785.063      8162.317        0.105     20.801    0.000    153785.992    185784.133 
  X31.TO.33    171884.037     12503.598        0.068     13.747    0.000    147375.561    196392.513 
  X34.TO.36    175258.867     13667.554        0.063     12.823    0.000    148468.904    202048.830 
  X37.TO.39    253764.291     13251.550        0.095     19.150    0.000    227789.744    279738.837 
  X40.TO.42    293472.998     15901.565        0.091     18.456    0.000    262304.120    324641.876 
  X43.TO.45    385882.408     42694.557        0.044      9.038    0.000    302196.213    469568.603 
  X46.TO.48    433981.085     42707.498        0.050     10.162    0.000    350269.524    517692.646 
  X49.TO.51    499701.208     52270.096        0.047      9.560    0.000    397245.866    602156.550 
    PROX_EC       -29.148         0.937       -0.161    -31.099    0.000       -30.985       -27.311 
    PROX_HA       -44.422         1.202       -0.191    -36.946    0.000       -46.778       -42.065 
   PROX_MRT       -50.994         1.651       -0.165    -30.892    0.000       -54.229       -47.758 
 PROX_GDPRI       -11.918         0.222       -0.293    -53.585    0.000       -12.354       -11.482 
  PROX_MALL       -38.886         1.797       -0.124    -21.638    0.000       -42.409       -35.363 
   PROX_SUP       -43.883         3.997       -0.057    -10.979    0.000       -51.718       -36.049 
    NUM_KIN      8892.413       616.123        0.075     14.433    0.000      7684.741     10100.085 
     NUM_CC     -5499.543       340.874       -0.090    -16.134    0.000     -6167.694     -4831.391 
   NUM_STOP     -1912.711       214.677       -0.046     -8.910    0.000     -2333.502     -1491.920 
    NUM_PRI    -19930.768       453.625       -0.254    -43.937    0.000    -20819.924    -19041.612 
-----------------------------------------------------------------------------------------------------

5.1.3 Testing Assumptions of Linear Regression Models

We need to fulfill the 4 following assumptions to perform regression modelling on spatial data:

5.1.3.1 Test for Multi-collinearity

ols_vif_tol() of olsrr package is used to test if there are signs of multi-collinearity.

ols_vif_tol(flat_mlr1)
    Variables Tolerance      VIF
1    AREA_SQM 0.8956868 1.116462
2   LEASE_YRS 0.7340376 1.362328
3   X01.TO.03 0.3569196 2.801751
4   X04.TO.06 0.3167722 3.156843
5   X07.TO.09 0.3371043 2.966441
6   X10.TO.12 0.3584952 2.789438
7   X13.TO.15 0.5009265 1.996301
8   X25.TO.27 0.9175446 1.089865
9   X28.TO.30 0.9346452 1.069925
10  X31.TO.33 0.9704066 1.030496
11  X34.TO.36 0.9742236 1.026458
12  X37.TO.39 0.9717020 1.029122
13  X40.TO.42 0.9809335 1.019437
14  X43.TO.45 0.9966778 1.003333
15  X46.TO.48 0.9960739 1.003942
16  X49.TO.51 0.9973718 1.002635
17    PROX_EC 0.8912374 1.122035
18    PROX_HA 0.8943974 1.118071
19   PROX_MRT 0.8300641 1.204726
20 PROX_GDPRI 0.7980468 1.253059
21  PROX_MALL 0.7244891 1.380283
22   PROX_SUP 0.8695809 1.149979
23    NUM_KIN 0.8852389 1.129639
24     NUM_CC 0.7608978 1.314237
25   NUM_STOP 0.8861773 1.128442
26    NUM_PRI 0.7106962 1.407071

Since the VIF of the independent variables are less than 10, we can safely conclude that there are no signs of multi-collinearity among the independent variables.

5.1.3.2 Test for Non-Linearity

ols_plot_resid_fit() of olsrr package is used to perform the linearity assumption test.

ols_plot_resid_fit(flat_mlr1)

The figure above reveals that most of the data points are scattered around the 0 line, we can therefore safely conclude that the relationships between the dependent variable and independent variables are linear.

5.1.3.3 Test for Normality

ols_plot_resid_hist() of olsrr package is used to perform the normality assumption test.

ols_plot_resid_hist(flat_mlr1)

The figure reveals that the residuals of the multiple linear regression model (i.e. flat_mlr1) resembles normal distribution.

5.1.3.4 Test for Spatial Autocorrelation

The hedonic model that we are building uses geographically referenced attributes, hence it is also important for us to visualise the residuals of the hedonic pricing model.

In order to perform spatial autocorrelation test, we need to convert the flat_resale_sf simple feature object into a SpatialPointsDataFrame.

First, we will export the residuals of the hedonic pricing model and save it as a data frame.

mlr_output <- as.data.frame(flat_mlr1$residuals)

Next, we will join the newly created data frame with the flat_resale_sf object.

flat_resale_res_sf <- cbind(flat_resale_sf,
                            mlr_output) %>%
  rename(`MLR_RES` = `flat_mlr1.residuals`)

Next, we will convert the flat_resale_res_sf simple feature object into a SpatialPointsDataFrame. This is required because the spdep package can only process sp conformed spatial data objects.

flat_resale_sp <- as_Spatial(flat_resale_res_sf)
flat_resale_sp
class       : SpatialPointsDataFrame 
features    : 15853 
extent      : 11597.31, 42623.63, 28217.39, 48741.06  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
variables   : 34
names       :   PRICE,        LOG_PRICE, AREA_SQM, LEASE_YRS, X01.TO.03, X07.TO.09, X04.TO.06, X10.TO.12, X13.TO.15, X25.TO.27, X19.TO.21, X16.TO.18, X28.TO.30, X31.TO.33, X22.TO.24, ... 
min values  :  218000, 12.2922503417712,       74,      45.5,         0,         0,         0,         0,         0,         0,         0,         0,         0,         0,         0, ... 
max values  : 1186888, 13.9868453136219,      138,        97,         1,         1,         1,         1,         1,         1,         1,         1,         1,         1,         1, ... 

Next, we will use tmap package to visualise the distribution of the residuals.

tmap_mode("view")

tm_shape(mpsz_sf)+
  tm_polygons(alpha = 0.4) +
tm_shape(flat_resale_res_sf) +  
  tm_dots(col = "MLR_RES",
          alpha = 0.6,
          style="quantile") +
  tm_view(set.zoom.limits = c(11,14))

The figure above reveals that there are signs of spatial autocorrelation. To prove that our observation is true, the Moran’s I test will be performed.

First, we will compute the distance-based weight matrix by using dnearneigh() function of spdep.

nb <- dnearneigh(coordinates(flat_resale_sp), 0, 1500, longlat = FALSE)

summary(nb)
Neighbour list object:
Number of regions: 15853 
Number of nonzero links: 10144322 
Percentage nonzero weights: 4.036455 
Average number of links: 639.8992 
Link number distribution:

   2   14   25   31   46   56   58   60   63   65   68   70   71   72 
   3   14   20    6   47    4    4    5    1    2    4    1   12    1 
  78   85   86   88   89   90   91   96   97   99  100  104  106  109 
  77    2    1    5    4    3    1    7    3    7    4    2    2    2 
 110  111  112  115  117  118  119  120  121  122  123  124  125  126 
   6    5    1    6    7    3    1    7   12    7   19    6   58   18 
 128  130  131  132  133  135  136  137  138  139  140  141  142  143 
  29   10   19    4    7    7    4    9   22    5    8    9   13    3 
 144  145  147  148  149  150  151  152  153  154  155  156  157  158 
  13    6   19    1    4    3    6   20   15   14    2   10    4    9 
 159  160  161  164  165  166  167  168  170  171  172  173  174  175 
   3    5   16    9   19   12    6    3    3   15    9   13    1   10 
 176  177  178  179  180  181  182  183  184  185  186  187  188  189 
  16   14   16   27   19    3   10    7   11   29   38    9    6    2 
 190  191  192  193  194  195  196  197  198  199  200  201  202  203 
   7   14    9    4    4    2    3    6   22   12   20   25   13    4 
 204  205  206  207  208  209  211  212  213  214  215  216  217  218 
  12    3    3   21   27    2    3    4    4   10   25    7    1    6 
 219  220  221  222  223  224  225  226  227  228  229  230  231  232 
  10   21   28   17    1   36   10    4   20   22    5   13   12    8 
 233  234  235  236  237  238  239  240  241  242  243  244  245  246 
   6   13   39   19    2   14    7   36   23    5   11   12    5   13 
 247  248  249  250  251  252  253  254  255  256  257  258  259  260 
  19   17   10   14   28   14    8   12    9   16   15    9    4    5 
 261  262  263  264  265  266  267  268  269  270  271  272  273  274 
  12   11   19   12   14   17   15   30    6    7   37   64   16   16 
 275  276  277  278  279  280  281  282  283  284  285  286  287  288 
  17   31   60    3   45   52   24   26   25   26   27   47   49   45 
 289  290  291  292  293  294  295  296  297  298  299  300  301  302 
  25   37   57   27   27   50   18    9   30   21   10   30   38   32 
 303  304  305  306  307  308  309  310  311  312  313  314  315  316 
  14   27   27   17   18    5    2   15   18   10   26   24    8   18 
 317  318  319  320  321  322  323  324  325  326  327  328  329  330 
  44   12   23   40   18   15   30   38   23   28   29   28   18   39 
 331  332  333  334  335  336  337  338  339  340  341  342  343  344 
   5   18   16   14   27    7   62   25   21   31   29    6   33   14 
 345  346  347  348  349  350  351  352  353  354  355  356  357  358 
  12   11   31   42   29   13   14   42   18   17   31   14   25   23 
 359  360  361  362  363  364  365  366  367  368  369  370  371  372 
  31   21   13   30   31    7   28   18   39   16   19   32   22   10 
 373  374  375  376  377  378  379  380  381  382  383  384  385  386 
  45   23   10   61    9   24   13   17   12   18   11   20   29   17 
 387  388  389  390  391  392  393  394  395  396  397  398  399  400 
  47   38   28   22    5   30    5    8   33   20   24   16   28   18 
 401  402  403  404  405  406  407  408  409  410  411  412  413  414 
  22   20   21   24   16    4   14   24   19   14   22   33    5   22 
 415  416  417  418  419  420  421  422  423  424  425  426  427  428 
  22    8   12   76    8   20   22   49   41   18   13   33   28   23 
 429  430  431  432  433  434  435  436  437  438  439  440  441  442 
  20   25   11   16   40   19   32   29   19   22   26   13   19    9 
 443  444  445  446  447  448  449  450  451  452  453  454  455  456 
   6    6   26   10   19   33   15   29    9   32   18   11   25   13 
 457  458  459  460  461  462  463  464  465  466  467  468  469  470 
  72   23   15   10   14   12    9   38   11    5   24   23    3   35 
 471  472  473  474  475  476  477  478  479  480  481  482  483  484 
  13   21   14   14   21   15   24   23   27   26   17    4   12   18 
 485  486  487  488  489  490  491  492  493  494  495  496  497  498 
  14   15    9   23   25   17   10    7   27   16    3    9   20    1 
 499  500  501  502  503  504  505  506  507  508  509  510  511  512 
   5   10   10   18   19    8   36   27   17   16   19   39   12   29 
 513  514  515  516  517  518  519  520  521  522  523  524  525  526 
  12   14   16    8    4   13    2    4   16    7   13   10   12   11 
 527  528  529  530  531  532  533  534  535  536  537  538  539  540 
  20   36   14   17   17    7   17   14   10   22    5    6    9    4 
 541  542  543  544  545  546  548  549  550  551  553  554  555  556 
   4   13   14   18   17   12    7   12   14   18    6   16    8   16 
 557  558  559  560  561  562  563  564  565  566  567  568  569  570 
  14   12   23    5    6    5   19   13   20   14   32   22   14    4 
 571  572  573  574  575  576  577  578  579  580  581  582  583  584 
  22    4   33   27   26   11   22    6   18   14    4    7    8   12 
 585  586  587  588  589  590  591  592  593  594  595  596  597  598 
  12   12   10   30   11   18   26   21    3   18    8   20   11    9 
 599  600  601  602  603  604  605  606  607  608  609  610  611  612 
  16   14   14    3   22    6   18   13    5   10   25   14   22    5 
 613  614  615  616  617  618  619  620  621  622  623  624  625  626 
   2   12   33    3   33   14    5    6   11    6   11   16    3   13 
 627  628  629  630  631  632  633  634  635  636  637  638  639  640 
  23    4    6    7   18    7   13   10   18    7   19    7    5    9 
 641  642  643  644  645  646  647  648  649  650  651  652  653  654 
  15   17    2   10   34    1   17    7   17   24    8   10   10    6 
 655  656  657  658  659  660  661  663  665  667  668  669  670  671 
   5    6    9   23   10    5    5    7    5    6   37   21    6    4 
 672  673  674  675  676  677  678  679  680  681  682  683  685  686 
  12    3   14    7    5    9   14   16    6    7   30   53    8    5 
 687  688  689  690  691  692  693  694  695  696  697  698  699  700 
   5    7   21    5   19    7   10   10   10   16   24    6   20    7 
 701  702  703  704  705  706  707  708  709  710  711  712  713  714 
   4   10   19    7   17    4    8    3    4   30    7    2    8   13 
 715  716  717  718  719  720  721  722  723  725  726  728  729  730 
  34    8   23   10   11    9    4   15    7    2   24    2    6   24 
 731  732  733  734  735  736  737  738  739  740  741  742  743  744 
   6    2    2    6    7    1   11    7    4   13    9    7    1   17 
 745  746  747  748  749  750  751  752  753  754  755  756  757  758 
  15   18   22   29   19   26    5    5    2    1   16    9    4   23 
 759  760  761  762  763  764  765  766  767  768  769  770  771  772 
  10    7   16    3    5   22   29   26    1   21    9   17    3   20 
 773  774  775  776  777  778  779  780  781  782  783  784  785  786 
  13   16   18   21   14    6   34   33   22   11    1    3   15   11 
 787  788  789  790  791  792  793  794  795  796  797  798  799  800 
   3    1    9   16   24   17    4   10    3   56   20   15    9    8 
 802  803  804  805  806  807  808  809  810  811  812  813  814  815 
  29    2   22    7   20   19   30   35   15    7   19    7   10    5 
 816  817  818  819  820  821  822  823  824  825  826  827  828  829 
  27   24   25   21   12    5   11   18   12    2    3   29   14   15 
 830  831  832  833  834  835  836  837  838  839  840  841  842  843 
  12   11    3   31   11   30    9   31    6    5   10   17   10    4 
 844  845  846  847  848  849  850  851  852  853  854  855  856  857 
  25    6   15   17   13   32    9   19    5   34    8    8   12   17 
 859  860  861  862  863  864  865  866  867  868  869  870  871  872 
   1   20   38    7    6    8   20   19    7    2    6    4    2    1 
 873  874  875  876  877  878  879  880  881  882  883  884  885  886 
   4   16   19    9    2    9    1    4    7   14   14   10    6   20 
 888  889  890  891  892  893  894  895  896  897  898  899  900  901 
  19   13    1   17    4    9    7    7    7   13   15    9    7   10 
 902  903  904  905  906  907  908  909  910  911  912  914  915  916 
  10   35    9    6    8    2    8   10   18    2    5    7    3    5 
 917  918  919  920  921  922  923  924  925  926  927  928  929  930 
  32   11   12   10   26    4    5    8    4   22   18    4    2    6 
 932  934  935  936  937  938  940  941  942  943  945  946  947  948 
  10   13    1   30    1    8    2   10    3    2    4    1    1   10 
 949  950  951  952  955  957  959  960  961  963  964  965  966  967 
   4   20    8    6    1    2    7   12    5   11   18    1   16    6 
 968  969  971  972  973  974  975  977  979  980  981  982  984  985 
   4    2   21    2   16    5    4   34   14    6    3    9    6    1 
 986  988  991  992  993  995  996  997  998  999 1000 1001 1003 1005 
   3   21   10    1    8    7   10    1   21   10    2    3    1   16 
1006 1007 1008 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1021 
   9    7    2    4    1    4    1   15   13   11   11   19    6    6 
1022 1023 1024 1025 1026 1027 1028 1029 1031 1032 1033 1034 1035 1037 
   8    9    1   16    7    3    8   18    2   18    6    3   10    3 
1038 1039 1040 1041 1042 1043 1044 1045 1046 1050 1051 1052 1053 1055 
  18   10    9    5   25    2   11    6    9    4    3    5   21    1 
1056 1057 1058 1059 1060 1061 1064 1065 1066 1067 1071 1072 1076 1077 
   7   15    3    8    1    3    4    8    1    7    3    6    9    9 
1079 1081 1082 1085 1086 1087 1088 1089 1091 1092 1094 1095 1097 1099 
   7    1   11    3    5   19    6    3   17    3   11    2    5    5 
1102 1103 1104 1105 1107 1109 1110 1112 1113 1114 1116 1117 1118 1119 
   6   14    5   11    4    2    5   12    3   12    3   13    4    7 
1120 1121 1123 1124 1128 1129 1130 1132 1133 1134 1135 1136 1137 1138 
   3    8    4    4    1   11   10    2    9    8    2   21    5   11 
1139 1140 1142 1143 1144 1145 1146 1147 1148 1149 1150 1152 1153 1154 
  13   21    7    8    4   14    4    5    3    6    3    8    9    7 
1155 1156 1157 1158 1159 1160 1161 1162 1164 1165 1168 1169 1170 1172 
   2    1   32    1    9    5   15    4   14    6   21   19   17    7 
1173 1174 1175 1176 1177 1178 1180 1182 1183 1185 1186 1187 1188 1191 
  28   15    2    5    4    7    2   19    8    8   11   20    6    7 
1194 1195 1196 1197 1198 1200 1201 1203 1204 1205 1206 1207 1209 1210 
   6    7    8   13   16    9   10    4    4   13    1   14   11    3 
1214 1215 1216 1217 1218 1219 1220 1221 1222 1224 1225 1226 1227 1228 
  10    6   16    7   31   40    9   22   16   13    7    7   11    4 
1230 1231 1232 1234 1235 1237 1238 1239 1240 1241 1242 1243 1245 1246 
  13   10   10   32    2   25    3    1    8    4   11    8    1    2 
1249 1250 1251 1254 1256 1257 1259 1260 1261 1263 1264 1265 1266 1267 
   1    5    7    4    6    2    2   10    1    3    1    2   19   12 
1268 1269 1270 1271 1274 1277 1278 1279 1280 1281 1282 1283 1284 1285 
   7    6    3    3    1    7   16   11    7    1   13    1    1   10 
1286 1287 1289 1290 1291 1292 1296 1297 1298 1300 1301 1302 1303 1304 
   9    2    5    5    4    7    7    3    6    5    3    6    4   22 
1305 1306 1307 1308 1309 1310 1311 1313 1315 1318 1319 1320 1321 1322 
  12   12    5    2    6   21    3    1   17    1   21    3    3    6 
1324 1326 1328 1329 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 
  12    1    9    1   10   11    6    9   23   13    3    1    3    9 
1341 1342 1343 1344 1346 1347 1349 1351 1352 1355 1356 1357 1358 1362 
  13   22   18    5    1   22    6    6    3    8   23    6    2   27 
1363 1364 1365 1367 1368 1370 1372 1373 1375 1376 1377 1378 1379 1380 
   5    2    1    7   12   12    7    6   11    4   11    6   19    3 
1381 1383 1385 1387 1388 1390 1391 1392 1393 1395 1396 1397 1399 1400 
   2   16    1    7   10   24    5    3    7    5   17    6    5    9 
1401 1402 1403 1404 1405 1408 1409 1412 1414 1417 1420 1422 1423 1427 
   9    9   11    3    1    6    5    5    5    3    3    3    2    1 
1428 1429 1431 1434 1435 1439 1441 1442 1443 1446 1450 1451 1452 1453 
   1    1   11    1    1    2    2    4   17    6    2    1    1    8 
1454 1458 1463 1464 1465 1466 1467 1469 1470 1473 1478 1479 1480 1483 
  11    3    1    3    2    5    2    1    7    2    1    3    9    1 
1484 1486 1487 1488 1491 1492 1495 1496 1498 1509 1513 1517 1520 1526 
   2    7    5    1    7    2    1   20    1    2    3    3    2    6 
1530 1531 1533 1537 1538 1539 1542 1546 1547 1552 1555 1557 1559 1563 
   4   12    1    3    2    3    1    2    7    4    2    3    2    8 
1564 1567 1571 1577 1578 1582 1583 1587 1592 1594 1603 1608 1612 1614 
   7    1    2    4    2    8    3    3    6    2    9    1    1    2 
1616 1620 1628 1630 1638 1644 1645 1669 1678 1687 1691 1700 1718 1728 
   2    2    2   16    1    3    4    1    1    1    1    5    1    1 
1731 1735 1746 1748 1759 1762 1767 1768 1793 1799 1802 1817 1845 1868 
   3    5    3    2    3    4    4    1    4    5    4    1    3    3 
1875 1913 1942 1956 
   1    2    5    4 
3 least connected regions:
3083 4737 5544 with 2 links
4 most connected regions:
3250 8144 12521 13498 with 1956 links

Next, nb2listw() of spdep packge will be used to convert the output neighbours lists (i.e. nb) into a spatial weights.

nb_lw <- nb2listw(nb, style = 'W')

summary(nb_lw)
Characteristics of weights list object:
Neighbour list object:
Number of regions: 15853 
Number of nonzero links: 10144322 
Percentage nonzero weights: 4.036455 
Average number of links: 639.8992 
Link number distribution:

   2   14   25   31   46   56   58   60   63   65   68   70   71   72 
   3   14   20    6   47    4    4    5    1    2    4    1   12    1 
  78   85   86   88   89   90   91   96   97   99  100  104  106  109 
  77    2    1    5    4    3    1    7    3    7    4    2    2    2 
 110  111  112  115  117  118  119  120  121  122  123  124  125  126 
   6    5    1    6    7    3    1    7   12    7   19    6   58   18 
 128  130  131  132  133  135  136  137  138  139  140  141  142  143 
  29   10   19    4    7    7    4    9   22    5    8    9   13    3 
 144  145  147  148  149  150  151  152  153  154  155  156  157  158 
  13    6   19    1    4    3    6   20   15   14    2   10    4    9 
 159  160  161  164  165  166  167  168  170  171  172  173  174  175 
   3    5   16    9   19   12    6    3    3   15    9   13    1   10 
 176  177  178  179  180  181  182  183  184  185  186  187  188  189 
  16   14   16   27   19    3   10    7   11   29   38    9    6    2 
 190  191  192  193  194  195  196  197  198  199  200  201  202  203 
   7   14    9    4    4    2    3    6   22   12   20   25   13    4 
 204  205  206  207  208  209  211  212  213  214  215  216  217  218 
  12    3    3   21   27    2    3    4    4   10   25    7    1    6 
 219  220  221  222  223  224  225  226  227  228  229  230  231  232 
  10   21   28   17    1   36   10    4   20   22    5   13   12    8 
 233  234  235  236  237  238  239  240  241  242  243  244  245  246 
   6   13   39   19    2   14    7   36   23    5   11   12    5   13 
 247  248  249  250  251  252  253  254  255  256  257  258  259  260 
  19   17   10   14   28   14    8   12    9   16   15    9    4    5 
 261  262  263  264  265  266  267  268  269  270  271  272  273  274 
  12   11   19   12   14   17   15   30    6    7   37   64   16   16 
 275  276  277  278  279  280  281  282  283  284  285  286  287  288 
  17   31   60    3   45   52   24   26   25   26   27   47   49   45 
 289  290  291  292  293  294  295  296  297  298  299  300  301  302 
  25   37   57   27   27   50   18    9   30   21   10   30   38   32 
 303  304  305  306  307  308  309  310  311  312  313  314  315  316 
  14   27   27   17   18    5    2   15   18   10   26   24    8   18 
 317  318  319  320  321  322  323  324  325  326  327  328  329  330 
  44   12   23   40   18   15   30   38   23   28   29   28   18   39 
 331  332  333  334  335  336  337  338  339  340  341  342  343  344 
   5   18   16   14   27    7   62   25   21   31   29    6   33   14 
 345  346  347  348  349  350  351  352  353  354  355  356  357  358 
  12   11   31   42   29   13   14   42   18   17   31   14   25   23 
 359  360  361  362  363  364  365  366  367  368  369  370  371  372 
  31   21   13   30   31    7   28   18   39   16   19   32   22   10 
 373  374  375  376  377  378  379  380  381  382  383  384  385  386 
  45   23   10   61    9   24   13   17   12   18   11   20   29   17 
 387  388  389  390  391  392  393  394  395  396  397  398  399  400 
  47   38   28   22    5   30    5    8   33   20   24   16   28   18 
 401  402  403  404  405  406  407  408  409  410  411  412  413  414 
  22   20   21   24   16    4   14   24   19   14   22   33    5   22 
 415  416  417  418  419  420  421  422  423  424  425  426  427  428 
  22    8   12   76    8   20   22   49   41   18   13   33   28   23 
 429  430  431  432  433  434  435  436  437  438  439  440  441  442 
  20   25   11   16   40   19   32   29   19   22   26   13   19    9 
 443  444  445  446  447  448  449  450  451  452  453  454  455  456 
   6    6   26   10   19   33   15   29    9   32   18   11   25   13 
 457  458  459  460  461  462  463  464  465  466  467  468  469  470 
  72   23   15   10   14   12    9   38   11    5   24   23    3   35 
 471  472  473  474  475  476  477  478  479  480  481  482  483  484 
  13   21   14   14   21   15   24   23   27   26   17    4   12   18 
 485  486  487  488  489  490  491  492  493  494  495  496  497  498 
  14   15    9   23   25   17   10    7   27   16    3    9   20    1 
 499  500  501  502  503  504  505  506  507  508  509  510  511  512 
   5   10   10   18   19    8   36   27   17   16   19   39   12   29 
 513  514  515  516  517  518  519  520  521  522  523  524  525  526 
  12   14   16    8    4   13    2    4   16    7   13   10   12   11 
 527  528  529  530  531  532  533  534  535  536  537  538  539  540 
  20   36   14   17   17    7   17   14   10   22    5    6    9    4 
 541  542  543  544  545  546  548  549  550  551  553  554  555  556 
   4   13   14   18   17   12    7   12   14   18    6   16    8   16 
 557  558  559  560  561  562  563  564  565  566  567  568  569  570 
  14   12   23    5    6    5   19   13   20   14   32   22   14    4 
 571  572  573  574  575  576  577  578  579  580  581  582  583  584 
  22    4   33   27   26   11   22    6   18   14    4    7    8   12 
 585  586  587  588  589  590  591  592  593  594  595  596  597  598 
  12   12   10   30   11   18   26   21    3   18    8   20   11    9 
 599  600  601  602  603  604  605  606  607  608  609  610  611  612 
  16   14   14    3   22    6   18   13    5   10   25   14   22    5 
 613  614  615  616  617  618  619  620  621  622  623  624  625  626 
   2   12   33    3   33   14    5    6   11    6   11   16    3   13 
 627  628  629  630  631  632  633  634  635  636  637  638  639  640 
  23    4    6    7   18    7   13   10   18    7   19    7    5    9 
 641  642  643  644  645  646  647  648  649  650  651  652  653  654 
  15   17    2   10   34    1   17    7   17   24    8   10   10    6 
 655  656  657  658  659  660  661  663  665  667  668  669  670  671 
   5    6    9   23   10    5    5    7    5    6   37   21    6    4 
 672  673  674  675  676  677  678  679  680  681  682  683  685  686 
  12    3   14    7    5    9   14   16    6    7   30   53    8    5 
 687  688  689  690  691  692  693  694  695  696  697  698  699  700 
   5    7   21    5   19    7   10   10   10   16   24    6   20    7 
 701  702  703  704  705  706  707  708  709  710  711  712  713  714 
   4   10   19    7   17    4    8    3    4   30    7    2    8   13 
 715  716  717  718  719  720  721  722  723  725  726  728  729  730 
  34    8   23   10   11    9    4   15    7    2   24    2    6   24 
 731  732  733  734  735  736  737  738  739  740  741  742  743  744 
   6    2    2    6    7    1   11    7    4   13    9    7    1   17 
 745  746  747  748  749  750  751  752  753  754  755  756  757  758 
  15   18   22   29   19   26    5    5    2    1   16    9    4   23 
 759  760  761  762  763  764  765  766  767  768  769  770  771  772 
  10    7   16    3    5   22   29   26    1   21    9   17    3   20 
 773  774  775  776  777  778  779  780  781  782  783  784  785  786 
  13   16   18   21   14    6   34   33   22   11    1    3   15   11 
 787  788  789  790  791  792  793  794  795  796  797  798  799  800 
   3    1    9   16   24   17    4   10    3   56   20   15    9    8 
 802  803  804  805  806  807  808  809  810  811  812  813  814  815 
  29    2   22    7   20   19   30   35   15    7   19    7   10    5 
 816  817  818  819  820  821  822  823  824  825  826  827  828  829 
  27   24   25   21   12    5   11   18   12    2    3   29   14   15 
 830  831  832  833  834  835  836  837  838  839  840  841  842  843 
  12   11    3   31   11   30    9   31    6    5   10   17   10    4 
 844  845  846  847  848  849  850  851  852  853  854  855  856  857 
  25    6   15   17   13   32    9   19    5   34    8    8   12   17 
 859  860  861  862  863  864  865  866  867  868  869  870  871  872 
   1   20   38    7    6    8   20   19    7    2    6    4    2    1 
 873  874  875  876  877  878  879  880  881  882  883  884  885  886 
   4   16   19    9    2    9    1    4    7   14   14   10    6   20 
 888  889  890  891  892  893  894  895  896  897  898  899  900  901 
  19   13    1   17    4    9    7    7    7   13   15    9    7   10 
 902  903  904  905  906  907  908  909  910  911  912  914  915  916 
  10   35    9    6    8    2    8   10   18    2    5    7    3    5 
 917  918  919  920  921  922  923  924  925  926  927  928  929  930 
  32   11   12   10   26    4    5    8    4   22   18    4    2    6 
 932  934  935  936  937  938  940  941  942  943  945  946  947  948 
  10   13    1   30    1    8    2   10    3    2    4    1    1   10 
 949  950  951  952  955  957  959  960  961  963  964  965  966  967 
   4   20    8    6    1    2    7   12    5   11   18    1   16    6 
 968  969  971  972  973  974  975  977  979  980  981  982  984  985 
   4    2   21    2   16    5    4   34   14    6    3    9    6    1 
 986  988  991  992  993  995  996  997  998  999 1000 1001 1003 1005 
   3   21   10    1    8    7   10    1   21   10    2    3    1   16 
1006 1007 1008 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1021 
   9    7    2    4    1    4    1   15   13   11   11   19    6    6 
1022 1023 1024 1025 1026 1027 1028 1029 1031 1032 1033 1034 1035 1037 
   8    9    1   16    7    3    8   18    2   18    6    3   10    3 
1038 1039 1040 1041 1042 1043 1044 1045 1046 1050 1051 1052 1053 1055 
  18   10    9    5   25    2   11    6    9    4    3    5   21    1 
1056 1057 1058 1059 1060 1061 1064 1065 1066 1067 1071 1072 1076 1077 
   7   15    3    8    1    3    4    8    1    7    3    6    9    9 
1079 1081 1082 1085 1086 1087 1088 1089 1091 1092 1094 1095 1097 1099 
   7    1   11    3    5   19    6    3   17    3   11    2    5    5 
1102 1103 1104 1105 1107 1109 1110 1112 1113 1114 1116 1117 1118 1119 
   6   14    5   11    4    2    5   12    3   12    3   13    4    7 
1120 1121 1123 1124 1128 1129 1130 1132 1133 1134 1135 1136 1137 1138 
   3    8    4    4    1   11   10    2    9    8    2   21    5   11 
1139 1140 1142 1143 1144 1145 1146 1147 1148 1149 1150 1152 1153 1154 
  13   21    7    8    4   14    4    5    3    6    3    8    9    7 
1155 1156 1157 1158 1159 1160 1161 1162 1164 1165 1168 1169 1170 1172 
   2    1   32    1    9    5   15    4   14    6   21   19   17    7 
1173 1174 1175 1176 1177 1178 1180 1182 1183 1185 1186 1187 1188 1191 
  28   15    2    5    4    7    2   19    8    8   11   20    6    7 
1194 1195 1196 1197 1198 1200 1201 1203 1204 1205 1206 1207 1209 1210 
   6    7    8   13   16    9   10    4    4   13    1   14   11    3 
1214 1215 1216 1217 1218 1219 1220 1221 1222 1224 1225 1226 1227 1228 
  10    6   16    7   31   40    9   22   16   13    7    7   11    4 
1230 1231 1232 1234 1235 1237 1238 1239 1240 1241 1242 1243 1245 1246 
  13   10   10   32    2   25    3    1    8    4   11    8    1    2 
1249 1250 1251 1254 1256 1257 1259 1260 1261 1263 1264 1265 1266 1267 
   1    5    7    4    6    2    2   10    1    3    1    2   19   12 
1268 1269 1270 1271 1274 1277 1278 1279 1280 1281 1282 1283 1284 1285 
   7    6    3    3    1    7   16   11    7    1   13    1    1   10 
1286 1287 1289 1290 1291 1292 1296 1297 1298 1300 1301 1302 1303 1304 
   9    2    5    5    4    7    7    3    6    5    3    6    4   22 
1305 1306 1307 1308 1309 1310 1311 1313 1315 1318 1319 1320 1321 1322 
  12   12    5    2    6   21    3    1   17    1   21    3    3    6 
1324 1326 1328 1329 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 
  12    1    9    1   10   11    6    9   23   13    3    1    3    9 
1341 1342 1343 1344 1346 1347 1349 1351 1352 1355 1356 1357 1358 1362 
  13   22   18    5    1   22    6    6    3    8   23    6    2   27 
1363 1364 1365 1367 1368 1370 1372 1373 1375 1376 1377 1378 1379 1380 
   5    2    1    7   12   12    7    6   11    4   11    6   19    3 
1381 1383 1385 1387 1388 1390 1391 1392 1393 1395 1396 1397 1399 1400 
   2   16    1    7   10   24    5    3    7    5   17    6    5    9 
1401 1402 1403 1404 1405 1408 1409 1412 1414 1417 1420 1422 1423 1427 
   9    9   11    3    1    6    5    5    5    3    3    3    2    1 
1428 1429 1431 1434 1435 1439 1441 1442 1443 1446 1450 1451 1452 1453 
   1    1   11    1    1    2    2    4   17    6    2    1    1    8 
1454 1458 1463 1464 1465 1466 1467 1469 1470 1473 1478 1479 1480 1483 
  11    3    1    3    2    5    2    1    7    2    1    3    9    1 
1484 1486 1487 1488 1491 1492 1495 1496 1498 1509 1513 1517 1520 1526 
   2    7    5    1    7    2    1   20    1    2    3    3    2    6 
1530 1531 1533 1537 1538 1539 1542 1546 1547 1552 1555 1557 1559 1563 
   4   12    1    3    2    3    1    2    7    4    2    3    2    8 
1564 1567 1571 1577 1578 1582 1583 1587 1592 1594 1603 1608 1612 1614 
   7    1    2    4    2    8    3    3    6    2    9    1    1    2 
1616 1620 1628 1630 1638 1644 1645 1669 1678 1687 1691 1700 1718 1728 
   2    2    2   16    1    3    4    1    1    1    1    5    1    1 
1731 1735 1746 1748 1759 1762 1767 1768 1793 1799 1802 1817 1845 1868 
   3    5    3    2    3    4    4    1    4    5    4    1    3    3 
1875 1913 1942 1956 
   1    2    5    4 
3 least connected regions:
3083 4737 5544 with 2 links
4 most connected regions:
3250 8144 12521 13498 with 1956 links

Weights style: W 
Weights constants summary:
      n        nn    S0       S1       S2
W 15853 251317609 15853 79.46042 63911.22

Next, lm.morantest() of spdep package will be used to perform Moran’s I test for residual spatial autocorrelation

lm.morantest(flat_mlr1, nb_lw)

    Global Moran I for regression residuals

data:  
model: lm(formula = PRICE ~ AREA_SQM + LEASE_YRS + X01.TO.03 +
X04.TO.06 + X07.TO.09 + X10.TO.12 + X13.TO.15 + X25.TO.27 +
X28.TO.30 + X31.TO.33 + X34.TO.36 + X37.TO.39 + X40.TO.42 +
X43.TO.45 + X46.TO.48 + X49.TO.51 + PROX_EC + PROX_HA +
PROX_MRT + PROX_GDPRI + PROX_MALL + PROX_SUP + NUM_KIN +
NUM_CC + NUM_STOP + NUM_PRI, data = flat_resale_sf)
weights: nb_lw

Moran I statistic standard deviate = 921.57, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Observed Moran I      Expectation         Variance 
    4.713397e-01    -3.669543e-04     2.619936e-07 

The Global Moran’s I test for residual spatial autocorrelation shows that its p-value is less than the alpha value of 0.05. We will thus reject the null hypothesis that the residuals are randomly distributed.

Since the Observed Global Moran I = 0.471 which is greater than 0, we can infer than the residuals resemble cluster distribution.

5.2 GWR Model Method

We will calibrate the GWR-based hedonic pricing model using the adaptive bandwidth approach.

5.2.1 Computing the adaptive bandwidth

We first use bw.gwr() to determine the recommended data point to use.

bw_adaptive <- bw.gwr(formula = PRICE ~ AREA_SQM + LEASE_YRS + PROX_EC +
                      PROX_HA + PROX_MRT  + PROX_GDPRI + PROX_MALL + PROX_SUP +
                      NUM_KIN + NUM_CC + NUM_STOP + NUM_PRI,
                      data=flat_resale_sp, approach="CV", kernel="gaussian",
                      adaptive=TRUE, longlat=FALSE)
Take a cup of tea and have a break, it will take a few minutes.
          -----A kind suggestion from GWmodel development group
Adaptive bandwidth: 9805 CV score: 9.263168e+13 
Adaptive bandwidth: 6068 CV score: 8.315779e+13 
Adaptive bandwidth: 3757 CV score: 7.260491e+13 
Adaptive bandwidth: 2330 CV score: 6.088141e+13 
Adaptive bandwidth: 1446 CV score: 4.445911e+13 
Adaptive bandwidth: 902 CV score: 3.45399e+13 
Adaptive bandwidth: 563 CV score: 2.884093e+13 
Adaptive bandwidth: 356 CV score: 2.453133e+13 
Adaptive bandwidth: 225 CV score: 2.114838e+13 
Adaptive bandwidth: 147 CV score: 1.881623e+13 
Adaptive bandwidth: 96 CV score: 1.701977e+13 
Adaptive bandwidth: 67 CV score: 1.596888e+13 
Adaptive bandwidth: 46 CV score: Inf 
Adaptive bandwidth: 76 CV score: 1.623633e+13 
Adaptive bandwidth: 57 CV score: 1.564312e+13 
Adaptive bandwidth: 55 CV score: 1.558171e+13 
Adaptive bandwidth: 49 CV score: Inf 
Adaptive bandwidth: 53 CV score: 1.543224e+13 
Adaptive bandwidth: 58 CV score: 1.56816e+13 
Adaptive bandwidth: 56 CV score: 1.562018e+13 
Adaptive bandwidth: 57 CV score: 1.564312e+13 
Adaptive bandwidth: 56 CV score: 1.562018e+13 
Adaptive bandwidth: 56 CV score: 1.562018e+13 
Adaptive bandwidth: 55 CV score: 1.558171e+13 
Adaptive bandwidth: 55 CV score: 1.558171e+13 
Adaptive bandwidth: 54 CV score: 1.553114e+13 
Adaptive bandwidth: 54 CV score: 1.553114e+13 
Adaptive bandwidth: 53 CV score: 1.543224e+13 

The result shows that the recommended data point is 56. We will then pass this value in while computing the adaptive bandwidth GWR model.

5.2.2 Constructing Adaptive Bandwidth GWR Model

We can now go ahead to calibrate the GWR-based hedonic pricing model using the adaptive bandwidth and gaussian kernel.

gwr_adaptive <- gwr.basic(formula = PRICE ~ AREA_SQM + LEASE_YRS + PROX_EC +
                      PROX_HA + PROX_MRT  + PROX_GDPRI + PROX_MALL + PROX_SUP +
                      NUM_KIN + NUM_CC + NUM_STOP + NUM_PRI,
                      data=flat_resale_sp, bw=bw_adaptive, 
                      kernel = 'gaussian', adaptive=TRUE, longlat = FALSE)

gwr_adaptive
   ***********************************************************************
   *                       Package   GWmodel                             *
   ***********************************************************************
   Program starts at: 2021-11-06 16:09:02 
   Call:
   gwr.basic(formula = PRICE ~ AREA_SQM + LEASE_YRS + PROX_EC + 
    PROX_HA + PROX_MRT + PROX_GDPRI + PROX_MALL + PROX_SUP + 
    NUM_KIN + NUM_CC + NUM_STOP + NUM_PRI, data = flat_resale_sp, 
    bw = bw_adaptive, kernel = "gaussian", adaptive = TRUE, longlat = FALSE)

   Dependent (y) variable:  PRICE
   Independent variables:  AREA_SQM LEASE_YRS PROX_EC PROX_HA PROX_MRT PROX_GDPRI PROX_MALL PROX_SUP NUM_KIN NUM_CC NUM_STOP NUM_PRI
   Number of data points: 15853
   ***********************************************************************
   *                    Results of Global Regression                     *
   ***********************************************************************

   Call:
    lm(formula = formula, data = data)

   Residuals:
    Min      1Q  Median      3Q     Max 
-239991  -48552   -9613   39391  603257 

   Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
   (Intercept)  2.742e+05  1.127e+04  24.334   <2e-16 ***
   AREA_SQM     1.466e+03  9.613e+01  15.248   <2e-16 ***
   LEASE_YRS    4.105e+03  5.623e+01  73.004   <2e-16 ***
   PROX_EC     -3.248e+01  1.036e+00 -31.362   <2e-16 ***
   PROX_HA     -5.108e+01  1.326e+00 -38.528   <2e-16 ***
   PROX_MRT    -5.963e+01  1.815e+00 -32.848   <2e-16 ***
   PROX_GDPRI  -1.254e+01  2.455e-01 -51.077   <2e-16 ***
   PROX_MALL   -4.594e+01  1.984e+00 -23.151   <2e-16 ***
   PROX_SUP    -3.998e+01  4.426e+00  -9.031   <2e-16 ***
   NUM_KIN      1.159e+04  6.791e+02  17.060   <2e-16 ***
   NUM_CC      -5.590e+03  3.771e+02 -14.822   <2e-16 ***
   NUM_STOP    -2.153e+03  2.377e+02  -9.060   <2e-16 ***
   NUM_PRI     -2.448e+04  4.921e+02 -49.736   <2e-16 ***

   ---Significance stars
   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
   Residual standard error: 81800 on 15840 degrees of freedom
   Multiple R-squared: 0.5372
   Adjusted R-squared: 0.5369 
   F-statistic:  1532 on 12 and 15840 DF,  p-value: < 2.2e-16 
   ***Extra Diagnostic information
   Residual sum of squares: 1.059802e+14
   Sigma(hat): 81768.11
   AIC:  403661.8
   AICc:  403661.8
   BIC:  388051.6
   ***********************************************************************
   *          Results of Geographically Weighted Regression              *
   ***********************************************************************

   *********************Model calibration information*********************
   Kernel function: gaussian 
   Adaptive bandwidth: 53 (number of nearest neighbours)
   Regression points: the same locations as observations are used.
   Distance metric: Euclidean distance metric is used.

   ****************Summary of GWR coefficient estimates:******************
                     Min.     1st Qu.      Median     3rd Qu.
   Intercept  -1.0236e+08 -3.7350e+05 -1.1166e+05  1.7093e+05
   AREA_SQM   -1.1656e+04  1.4608e+03  2.2916e+03  3.8652e+03
   LEASE_YRS  -2.1344e+04  3.2116e+03  5.1528e+03  7.2816e+03
   PROX_EC    -1.7900e+04 -5.3473e+01  6.0407e+00  5.2218e+01
   PROX_HA    -9.4910e+04 -4.7023e+01  2.5105e+00  7.1890e+01
   PROX_MRT   -7.0885e+04 -1.2639e+02 -5.7240e+01  3.7897e+00
   PROX_GDPRI -1.5787e+04 -6.0603e+01 -6.7086e+00  4.4177e+01
   PROX_MALL  -7.0324e+04 -9.0740e+01 -2.4896e+01  4.3880e+01
   PROX_SUP   -3.5381e+03 -5.8118e+01 -1.1337e+01  4.6861e+01
   NUM_KIN    -5.0136e+06 -7.4942e+03 -6.6209e+02  4.8968e+03
   NUM_CC     -1.2328e+05 -2.4903e+03  2.3985e+02  3.2078e+03
   NUM_STOP   -3.5763e+04 -1.9891e+03  4.1286e+01  2.1538e+03
   NUM_PRI    -6.5536e+05 -8.1013e+03 -1.2313e+03  4.2885e+03
                   Max.
   Intercept  129458402
   AREA_SQM      224871
   LEASE_YRS      32079
   PROX_EC        77518
   PROX_HA        81068
   PROX_MRT       33093
   PROX_GDPRI     18186
   PROX_MALL      45254
   PROX_SUP       45269
   NUM_KIN       257280
   NUM_CC        201473
   NUM_STOP      124415
   NUM_PRI       303858
   ************************Diagnostic information*************************
   Number of data points: 15853 
   Effective number of parameters (2trace(S) - trace(S'S)): 1673.642 
   Effective degrees of freedom (n-2trace(S) + trace(S'S)): 14179.36 
   AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 372896.5 
   AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 371285.5 
   BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 367182.2 
   Residual sum of squares: 1.264499e+13 
   R-square value:  0.9447844 
   Adjusted R-square value:  0.9382667 

   ***********************************************************************
   Program stops at: 2021-11-06 16:12:22 

The report shows that the adjusted r-square of the GWR is 0.9382667 which is significantly better than the global multiple linear regression model of 0.5369.

5.2.3 Visualising GWR Output

In addition to regression residuals, the output feature class table includes:

They are all stored in a SpatialPointsDataFrame or SpatialPolygonsDataFrame object integrated with fit.points, GWR coefficient estimates, y value, predicted values, coefficient standard errors and t-values in its “data” slot in an object called SDF of the output list.

5.2.3.1 Converting SDF into sf data frame

To visualise the fields in SDF, we need to first covert it into sf data frame.

flat_resale_sf_adaptive <- st_as_sf(gwr_adaptive$SDF) %>%
  st_transform(crs=3414)
gwr_adaptive_output <- as.data.frame(gwr_adaptive$SDF)

flat_resale_sf_adaptive <- cbind(flat_resale_res_sf,
                                  as.matrix(gwr_adaptive_output))

glimpse(flat_resale_sf_adaptive)
Rows: 15,853
Columns: 82
$ PRICE         <dbl> 330000, 360000, 370000, 375000, 380000, 380000~
$ LOG_PRICE     <dbl> 12.70685, 12.79386, 12.82126, 12.83468, 12.847~
$ AREA_SQM      <dbl> 92, 91, 92, 99, 92, 92, 92, 92, 93, 91, 91, 98~
$ LEASE_YRS     <dbl> 57.00, 61.50, 61.08, 58.33, 59.58, 61.00, 58.8~
$ X01.TO.03     <dbl> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X07.TO.09     <dbl> 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0~
$ X04.TO.06     <dbl> 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0~
$ X10.TO.12     <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0~
$ X13.TO.15     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1~
$ X25.TO.27     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X19.TO.21     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X16.TO.18     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X28.TO.30     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X31.TO.33     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X22.TO.24     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X37.TO.39     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X34.TO.36     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X40.TO.42     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X43.TO.45     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X46.TO.48     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ X49.TO.51     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ PROX_CBD      <dbl> 8824.749, 9841.309, 9560.780, 9609.731, 8351.0~
$ PROX_EC       <dbl> 251.4065, 631.8448, 1082.4168, 347.3195, 195.8~
$ PROX_HA       <dbl> 441.82653, 269.72560, 258.29513, 436.47518, 70~
$ PROX_MRT      <dbl> 703.9715, 1096.9096, 889.9529, 1409.3169, 886.~
$ PROX_PARK     <dbl> 730.3122, 645.3451, 743.7891, 808.8585, 951.45~
$ PROX_GDPRI    <dbl> 1426.311, 1735.175, 2069.003, 1431.997, 2409.0~
$ PROX_MALL     <dbl> 546.0414, 1452.7304, 1030.7976, 1514.3816, 994~
$ PROX_SUP      <dbl> 270.8222, 310.1889, 318.7560, 458.6748, 337.98~
$ NUM_KIN       <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 2~
$ NUM_CC        <dbl> 6, 5, 2, 3, 3, 2, 3, 4, 3, 2, 4, 4, 4, 5, 2, 7~
$ NUM_STOP      <dbl> 8, 8, 8, 7, 6, 9, 6, 6, 5, 4, 10, 5, 6, 9, 8, ~
$ NUM_PRI       <dbl> 2, 2, 1, 2, 2, 1, 3, 2, 2, 2, 2, 2, 3, 2, 2, 3~
$ MLR_RES       <dbl> -65028.840, -25545.406, -26560.907, 7843.605, ~
$ Intercept     <dbl> -350761.80, -55327.20, -227355.59, -46943.06, ~
$ AREA_SQM.1    <dbl> 3865.1772, 1737.2207, 2625.1875, 1691.6949, 40~
$ LEASE_YRS.1   <dbl> 6611.712, 6424.919, 10006.102, 6298.735, 8284.~
$ PROX_EC.1     <dbl> -43.8754894, -2.8562872, -61.0260887, -0.88612~
$ PROX_HA.1     <dbl> -36.3592938, 37.0574321, -36.5263396, 87.79895~
$ PROX_MRT.1    <dbl> -168.34579, -16.77467, -90.13781, -41.06379, -~
$ PROX_GDPRI.1  <dbl> 34.898286, -27.915369, -24.089908, -44.094387,~
$ PROX_MALL.1   <dbl> 115.915461, -47.616086, -18.726036, -24.432638~
$ PROX_SUP.1    <dbl> 73.043248, 89.510041, -41.240030, 60.707395, -~
$ NUM_KIN.1     <dbl> 1575.8238, 12980.2395, 2033.8891, 9039.6359, -~
$ NUM_CC.1      <dbl> -2088.0613, 2556.1607, -2730.4295, 2805.3276, ~
$ NUM_STOP.1    <dbl> 7683.8487, -308.4065, 2569.5484, 633.8828, 549~
$ NUM_PRI.1     <dbl> -6256.462, -21381.637, -20925.171, -14726.568,~
$ y             <dbl> 330000, 360000, 370000, 375000, 380000, 380000~
$ yhat          <dbl> 406953.8, 378366.9, 383539.6, 388223.6, 373838~
$ residual      <dbl> -76953.8297, -18366.8767, -13539.6453, -13223.~
$ CV_Score      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ Stud_residual <dbl> -2.79850481, -0.63629402, -0.47335355, -0.4649~
$ Intercept_SE  <dbl> 86015.65, 57512.72, 134655.97, 55066.51, 15457~
$ AREA_SQM_SE   <dbl> 626.1058, 488.0151, 1123.0209, 454.2593, 1204.~
$ LEASE_YRS_SE  <dbl> 215.2578, 252.7942, 295.7688, 208.5840, 367.59~
$ PROX_EC_SE    <dbl> 41.16518, 17.02008, 19.19507, 14.00328, 23.571~
$ PROX_HA_SE    <dbl> 33.139552, 23.558663, 37.946592, 21.782462, 38~
$ PROX_MRT_SE   <dbl> 72.61696, 26.28474, 29.59957, 26.01910, 42.299~
$ PROX_GDPRI_SE <dbl> 30.225445, 17.937704, 18.894105, 16.530167, 22~
$ PROX_MALL_SE  <dbl> 71.316385, 18.273429, 29.884950, 19.572021, 43~
$ PROX_SUP_SE   <dbl> 46.57929, 27.54419, 29.04538, 25.79186, 29.471~
$ NUM_KIN_SE    <dbl> 6312.955, 5186.056, 7882.663, 4705.616, 6429.2~
$ NUM_CC_SE     <dbl> 3376.605, 1758.835, 3540.494, 1769.266, 4205.2~
$ NUM_STOP_SE   <dbl> 1906.4874, 1646.1514, 1989.5683, 1570.5296, 27~
$ NUM_PRI_SE    <dbl> 7858.392, 6023.790, 7674.683, 5301.985, 7121.3~
$ Intercept_TV  <dbl> -4.0778837, -0.9619993, -1.6884182, -0.8524793~
$ AREA_SQM_TV   <dbl> 6.1733609, 3.5597682, 2.3376124, 3.7240730, 3.~
$ LEASE_YRS_TV  <dbl> 30.715324, 25.415609, 33.830822, 30.197599, 22~
$ PROX_EC_TV    <dbl> -1.06583991, -0.16781867, -3.17925848, -0.0632~
$ PROX_HA_TV    <dbl> -1.09715707, 1.57298537, -0.96257233, 4.030717~
$ PROX_MRT_TV   <dbl> -2.3182709, -0.6381906, -3.0452410, -1.5782171~
$ PROX_GDPRI_TV <dbl> 1.1545996, -1.5562398, -1.2749960, -2.6675100,~
$ PROX_MALL_TV  <dbl> 1.6253693, -2.6057553, -0.6266042, -1.2483452,~
$ PROX_SUP_TV   <dbl> 1.56814872, 3.24968852, -1.41984837, 2.3537422~
$ NUM_KIN_TV    <dbl> 0.24961745, 2.50291168, 0.25802055, 1.92103155~
$ NUM_CC_TV     <dbl> -0.6183908, 1.4533264, -0.7712001, 1.5855880, ~
$ NUM_STOP_TV   <dbl> 4.0303694, -0.1873500, 1.2915105, 0.4036109, 1~
$ NUM_PRI_TV    <dbl> -0.7961504, -3.5495321, -2.7265193, -2.7775575~
$ Local_R2      <dbl> 0.8533566, 0.7946223, 0.9666896, 0.8001496, 0.~
$ coords.x1     <dbl> 29179.92, 28423.42, 30550.38, 28240.06, 30443.~
$ coords.x2     <dbl> 38822.08, 39745.94, 39588.77, 39477.60, 38382.~
$ geometry      <POINT [m]> POINT (29179.92 38822.08), POINT (28423.~

5.2.3.2 Visualising Local R2

tmap_mode("view")

tm_shape(mpsz_sf)+
  tm_polygons(alpha = 0.1) +
tm_shape(flat_resale_sf_adaptive) +  
  tm_dots(col = "Local_R2",
          border.col = "gray60",
          border.lwd = 1) +
  tm_view(set.zoom.limits = c(11,14))

We can observe that majority of the HDB flats have Local R-squared values in the range from 0.6 to 1. This implies that most of the HDB Resale Prices are explained by the GWR Model.

6.0 Conclusion

To recap, we used the following factors to build the hedonic pricing model:

From our study, we found that the majority of these factors affect the resale prices of 4-room flats with a transaction period from 01-Jan-2019 to 30-Sep-2020. Only the factor PROX_CBD is not used in model building since it has high correlation with another factor PROX_GDPRI.

7.0 References